Text categorization methods for automatic estimation of verbal intelligence
Abstract
In this paper we investigate whether conventional text categorization methods suffice to infer different verbal intelligence levels. This research goal rests on the hypothesis that the vocabulary speakers use reflects their verbal intelligence levels. Automatically estimating the verbal intelligence of users of a spoken language dialog system may help define an optimal dialog strategy by improving the system's adaptation capabilities. The work is based on a corpus containing descriptions (i.e. monologs) of a short film by test persons with different educational backgrounds, together with the verbal intelligence scores of the speakers. First, a one-way analysis of variance was performed to compare the monologs with the film transcription and to demonstrate that there are differences in the vocabulary used by test persons with different verbal intelligence levels. Then, for the classification task, the monologs were represented as feature vectors using the classical TF-IDF weighting scheme. The Naive Bayes, k-nearest neighbors and Rocchio classifiers were tested. In this paper we describe and compare these classification approaches, define the optimal classification parameters and discuss the classification results obtained.
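The pipeline outlined in the abstract (TF-IDF feature vectors fed to a classifier) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy sentences and the labels "lower" and "higher" are invented for illustration, the smoothed IDF formula is one common variant chosen here as an assumption, the Rocchio classifier is reduced to its centroid-only form, and Naive Bayes is left out for brevity. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def fit_idf(docs):
    """Inverse document frequency per term (smoothed with +1; an assumption)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / df[term]) + 1.0 for term in df}

def vectorize(doc, idf):
    """Sparse TF-IDF vector; terms unseen during fitting are ignored."""
    tf = Counter(term for term in doc if term in idf)
    return {term: count * idf[term] for term, count in tf.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rocchio_fit(vectors, labels):
    """Centroid-only Rocchio: one mean vector per class."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        for term, w in vec.items():
            sums[label][term] += w
    return {label: {term: w / counts[label] for term, w in terms.items()}
            for label, terms in sums.items()}

def rocchio_predict(centroids, vec):
    """Assign the class whose centroid is most similar to the vector."""
    return max(centroids, key=lambda label: cosine(centroids[label], vec))

def knn_predict(vectors, labels, vec, k=3):
    """Majority vote among the k most similar training vectors."""
    ranked = sorted(zip(vectors, labels),
                    key=lambda pair: cosine(pair[0], vec), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy training data (invented; not taken from the corpus described above).
train_docs = [
    "the man walks and the dog runs".split(),
    "the man sees the dog and runs".split(),
    "the protagonist contemplates an ambiguous encounter".split(),
    "an ambiguous narrative portrays the protagonist".split(),
]
train_labels = ["lower", "lower", "higher", "higher"]

idf = fit_idf(train_docs)
train_vectors = [vectorize(doc, idf) for doc in train_docs]
centroids = rocchio_fit(train_vectors, train_labels)

query = vectorize("the protagonist portrays an encounter".split(), idf)
rocchio_label = rocchio_predict(centroids, query)
knn_label = knn_predict(train_vectors, train_labels, query, k=3)
print(rocchio_label, knn_label)
```

In this sketch the value of k and the IDF variant are exactly the kind of classification parameters the abstract says were tuned; the values used here are arbitrary.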
Related resources
Verbal intelligence identification based on text classification
This paper analyses and compares term weighting methods for automatic verbal intelligence identification from speech. Two different corpora are used; the first one contains monologues on the same topic; the second one contains dialogues between two or three people. The problem is described as a text classification task with two classes: low and high verbal intelligence. Seven different term wei...
Speech Data Corpus for Verbal Intelligence Estimation
The goal of our research is the development of algorithms for automatic estimation of a person’s verbal intelligence based on the analysis of transcribed spoken utterances. In this paper we present the corpus of German native speakers’ monologues and dialogues about the same topics collected at the University of Ulm, Germany. The monologues were descriptions of two short films; the dialogues we...
Organizing Digital Libraries by Automated Text Categorization∗
Text Categorization (TC) is the discipline concerned with the construction of automatic text classifiers, i.e. programs capable of assigning to a document one or more among a set of ∗This is an extended version of an invited paper presented by the third author at the Workshop on Artificial Intelligence for Cultural Heritage and Digital Libraries, co-located with the 7th Conference of the Italia...
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data in order to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
Designing and implementing a system for Automatic recognition of Persian letters by Lip-reading using image processing methods
For many years, speech has been the most natural and efficient means of information exchange for human beings. With the advancement of technology and the prevalence of computer usage, researchers have turned to the design and production of speech recognition systems. Among these, lip-reading techniques have encountered many challenges for speech recognition, one of the challenges b...
Journal:
- Expert Syst. Appl.
Volume: 39
Publication date: 2012